linguistic feature
Visualizing and Measuring the Geometry of BERT
Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.
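The claim that syntactic structure is written into embedding geometry can be probed directly. A minimal sketch, assuming the Hugging Face `transformers` library; note the paper's argument concerns squared distances under a learned linear transform (a structural probe), so the raw embedding distances computed here are only a crude starting point:

```python
# Compare pairwise distances between BERT's contextual word embeddings;
# under a trained structural probe, squared distances approximate
# dependency-tree distances between words.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

sentence = "the chef who ran to the store was out of food"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state[0, 1:-1]  # drop [CLS]/[SEP]

# Squared Euclidean distance between every pair of word vectors
# (every word above is a single wordpiece, so no alignment is needed).
sq_dists = torch.cdist(hidden, hidden).pow(2)
print(sq_dists.shape)  # (num_words, num_words)
```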
LLMs Know More Than Words: A Genre Study with Syntax, Metaphor & Phonetics
Shi, Weiye, Zhang, Zhaowei, Yan, Shaoheng, Yang, Yaodong
Large language models (LLMs) demonstrate remarkable potential across diverse language-related tasks, yet whether they capture deeper linguistic properties--such as syntactic structure, phonetic cues, and metrical patterns--from raw text remains unclear. To analyze whether LLMs can learn these features effectively and apply them to important natural-language tasks, we introduce a novel multilingual genre classification dataset derived from Project Gutenberg, a large-scale digital library offering free access to thousands of public-domain literary works. The dataset comprises thousands of sentences per binary task (poetry vs. novel; drama vs. poetry; drama vs. novel) in six languages (English, French, German, Italian, Spanish, and Portuguese). We augment each sentence with three explicit linguistic feature sets (syntactic tree structures, metaphor counts, and phonetic metrics) to evaluate their impact on classification performance. Experiments demonstrate that although LLM classifiers can learn latent linguistic structures either from raw text or from explicitly provided features, different features contribute unevenly across tasks, underscoring the importance of incorporating more complex linguistic signals during model training.
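To make concrete how explicit features of this kind can feed a classifier, here is a toy sketch: three stand-in features (dependency parse depth via spaCy, a vowel-group syllable heuristic as a phonetic proxy, and length) and a linear model. The feature definitions and the two labeled sentences are invented; the paper's feature sets are richer:

```python
# Toy feature extraction + classifier for poetry-vs.-novel classification.
# Assumes spaCy's small English model is installed
# (python -m spacy download en_core_web_sm).
import re
import spacy
from sklearn.linear_model import LogisticRegression

nlp = spacy.load("en_core_web_sm")

def syllables(word):
    # Crude vowel-group count as a phonetic proxy.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def depth(token):
    d = 0
    while token.head is not token:
        token, d = token.head, d + 1
    return d

def features(text):
    doc = nlp(text)
    words = [t for t in doc if t.is_alpha]
    n = max(len(words), 1)
    return [max((depth(t) for t in doc), default=0),    # parse-tree depth
            sum(syllables(t.text) for t in words) / n,  # syllables per word
            len(words)]                                 # sentence length

X = [features("Shall I compare thee to a summer's day?"),
     features("He opened the door and walked into the rain.")]
y = [1, 0]  # 1 = poetry, 0 = novel (toy labels)
clf = LogisticRegression().fit(X, y)
```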
Signature vs. Substance: Evaluating the Balance of Adversarial Resistance and Linguistic Quality in Watermarking Large Language Models
Guo, William, Uchendu, Adaku, Smith, Ana
To mitigate the potential harms of text generated by Large Language Models (LLMs), researchers have proposed watermarking, a process of embedding detectable signals within text; with watermarking, LLM-generated texts can in principle always be accurately detected. However, recent findings suggest that these techniques often degrade the quality of the generated texts, and that adversarial attacks can strip the watermarking signals, allowing the texts to evade detection. These findings have created resistance to the wide adoption of watermarking among LLM creators. To encourage adoption, we evaluate the robustness of several watermarking techniques to adversarial attacks, comparing paraphrasing and back-translation (i.e., English $\to$ another language $\to$ English) attacks, and assess their ability to preserve the quality and writing style of the unwatermarked texts using linguistic metrics. Our results suggest that these watermarking techniques preserve semantics, deviate from the writing style of the unwatermarked texts, and are susceptible to adversarial attacks, especially the back-translation attack.
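The abstract does not name a particular watermarking scheme, but the widely used "green list" construction (in the style of Kirchenbauer et al., 2023) makes the detection side concrete, and it also shows why paraphrasing or back-translation is damaging: rewriting the text re-draws the tokens, pushing the green fraction back toward chance. A toy word-level sketch:

```python
# Toy green-list watermark detector: a token is "green" if a hash seeded
# by its predecessor lands in the bottom gamma fraction; watermarked
# generation favors green tokens, so detection checks their rate.
import hashlib
import math

def is_green(prev_tok, tok, gamma=0.5):
    h = hashlib.sha256(f"{prev_tok}|{tok}".encode()).digest()
    return h[0] / 256.0 < gamma

def z_score(tokens, gamma=0.5):
    n = len(tokens) - 1
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    # Standardized excess of green tokens over the chance rate gamma.
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))

print(z_score("the quick brown fox jumps over the lazy dog".split()))
# near 0 for unwatermarked text; large positive values flag a watermark
```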
LLMCARE: early detection of cognitive impairment via transformer models enhanced by LLM-generated synthetic data
Zolnour, Ali, Azadmaleki, Hossein, Haghbin, Yasaman, Taherinezhad, Fatemeh, Nezhad, Mohamad Javad Momeni, Rashidi, Sina, Khani, Masoud, Taleban, AmirSajjad, Sani, Samin Mahdizadeh, Dadkhah, Maryam, Noble, James M., Bakken, Suzanne, Yaghoobzadeh, Yadollah, Vahabie, Abdol-Hossein, Rouhizadeh, Masoud, Zolnoori, Maryam
Alzheimer's disease and related dementias (ADRD) affect nearly five million older adults in the United States, yet more than half remain undiagnosed. Speech-based natural language processing (NLP) offers a scalable approach for detecting early cognitive decline through subtle linguistic markers that may precede clinical diagnosis. This study develops and evaluates a speech-based screening pipeline integrating transformer embeddings with handcrafted linguistic features, synthetic augmentation using large language models (LLMs), and benchmarking of unimodal and multimodal classifiers. External validation assessed generalizability to an MCI-only cohort. Transcripts were drawn from the ADReSSo 2021 benchmark dataset (n=237, Pitt Corpus) and the DementiaBank Delaware corpus (n=205, MCI vs. controls). Ten transformer models were tested under three fine-tuning strategies. A late-fusion model combined embeddings from the top transformer with 110 linguistic features. Five LLMs (LLaMA 8B/70B, MedAlpaca 7B, Ministral 8B, GPT-4o) generated label-conditioned synthetic speech for augmentation, and three multimodal LLMs (GPT-4o, Qwen-Omni, Phi-4) were evaluated in zero-shot and fine-tuned modes. On ADReSSo, the fusion model achieved F1 = 83.3 (AUC = 89.5), outperforming transformer-only and linguistic baselines. MedAlpaca 7B augmentation (2x) improved F1 to 85.7, though larger augmentation scales reduced gains. Fine-tuning boosted unimodal LLMs (MedAlpaca 7B F1: 47.7 to 78.7), while multimodal models performed lower (Phi-4: 71.6; GPT-4o: 67.6). On Delaware, the fusion plus 1x MedAlpaca 7B model achieved F1 = 72.8 (AUC = 69.6). Integrating transformer and linguistic features enhances ADRD detection. LLM-based augmentation improves data efficiency but yields diminishing returns, and current multimodal models remain limited. Validation on an independent MCI cohort supports the pipeline's potential for scalable, clinically relevant early screening.
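The late-fusion step is architecturally simple: concatenate the transformer's sentence embedding with the handcrafted feature vector and train a small classification head on the result. A PyTorch sketch; the input dimensions (768 + 110) follow the abstract, while the hidden size and dropout rate are assumptions:

```python
# Late fusion of a transformer embedding with handcrafted linguistic
# features for binary ADRD-vs.-control classification.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, emb_dim=768, ling_dim=110, hidden=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(emb_dim + ling_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, 2),
        )

    def forward(self, emb, ling):
        # Fuse the two views by simple concatenation.
        return self.head(torch.cat([emb, ling], dim=-1))

model = LateFusionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 110))  # toy batch of 4
```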
Context is Enough: Empirical Validation of $\textit{Sequentiality}$ on Essays
Sunny, Amal, Gupta, Advay, Sreekumar, Vishnu
Recent work has proposed using Large Language Models (LLMs) to quantify narrative flow through a measure called sequentiality, which combines topic and contextual terms. A recent critique argued that the original results were confounded by how topics were selected for the topic-based component, and noted that the metric had not been validated against ground-truth measures of flow. That work proposed using only the contextual term as a more conceptually valid and interpretable alternative. In this paper, we empirically validate that proposal. Using two essay datasets with human-annotated trait scores, ASAP++ and ELLIPSE, we show that the contextual version of sequentiality aligns more closely with human assessments of discourse-level traits such as Organization and Cohesion. While zero-shot prompted LLMs predict trait scores more accurately than the contextual measure alone, the contextual measure adds more predictive value than both the topic-only and original sequentiality formulations when combined with standard linguistic features. Notably, this combination also outperforms the zero-shot LLM predictions, highlighting the value of explicitly modeling sentence-to-sentence flow. Our findings support the use of context-based sequentiality as a validated, interpretable, and complementary feature for automated essay scoring and related NLP tasks.
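The contextual term being validated reduces to an average per-token negative log-likelihood of each sentence conditioned on its predecessor. A minimal sketch with GPT-2; the original work's choice of language model, context window, and normalization may differ:

```python
# Contextual "sequentiality": mean per-token NLL of each sentence given
# the previous sentence, scored with GPT-2.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def nll(context, sentence):
    ctx = tok(context, return_tensors="pt").input_ids
    sent = tok(" " + sentence, return_tensors="pt").input_ids
    ids = torch.cat([ctx, sent], dim=1)
    with torch.no_grad():
        logits = lm(ids).logits
    c = ctx.shape[1]
    tgt = ids[0, c:]                          # sentence tokens only
    lp = logits[0, c - 1:-1].log_softmax(-1)  # predictions for those tokens
    return -lp[torch.arange(len(tgt)), tgt].mean().item()

sents = ["The storm rolled in at dusk.", "By midnight the river had risen."]
contextual = sum(nll(a, b) for a, b in zip(sents, sents[1:])) / (len(sents) - 1)
```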
Enabling Automatic Self-Talk Detection via Earables
Lee, Euihyeok, Kim, Seonghyeon, Im, SangHun, Oh, Heung-Seon, Kang, Seungwoo
Self-talk, an internal dialogue that can occur silently or be spoken aloud, plays a crucial role in emotional regulation, cognitive processing, and motivation, yet has remained largely invisible and unmeasurable in everyday life. In this paper, we present MutterMeter, a mobile system that automatically detects vocalized self-talk from audio captured by earable microphones in real-world settings. Detecting self-talk is technically challenging due to its diverse acoustic forms, semantic and grammatical incompleteness, and irregular occurrence patterns, which differ fundamentally from assumptions underlying conventional speech understanding models. To address these challenges, MutterMeter employs a hierarchical classification architecture that progressively integrates acoustic, linguistic, and contextual information through a sequential processing pipeline, adaptively balancing accuracy and computational efficiency. We build and evaluate MutterMeter using a first-of-its-kind dataset comprising 31.1 hours of audio collected from 25 participants. Experimental results demonstrate that MutterMeter achieves robust performance with a macro-averaged F1 score of 0.84, outperforming conventional approaches, including LLM-based and speech emotion recognition models.
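The hierarchical design can be pictured as an early-exit cascade: a cheap acoustic stage handles the easy cases, and heavier linguistic and contextual stages run only when the previous stage is uncertain. The scoring functions and thresholds below are placeholders, not the paper's models:

```python
# Early-exit cascade: later (costlier) stages run only when earlier
# stages are uncertain, trading accuracy against on-device compute.
def acoustic_score(seg):   return seg.get("energy", 0.5)       # placeholder
def linguistic_score(seg): return seg.get("fragmentary", 0.5)  # placeholder
def context_score(seg):    return seg.get("alone", 0.5)        # placeholder

STAGES = [("acoustic", acoustic_score),
          ("linguistic", linguistic_score),
          ("contextual", context_score)]

def classify(segment, lo=0.2, hi=0.8):
    p = 0.5
    for name, score in STAGES:
        p = score(segment)
        if p <= lo or p >= hi:          # confident: exit early
            return p >= hi, name
    return p >= 0.5, STAGES[-1][0]      # otherwise the last stage decides

label, stage = classify({"energy": 0.55, "fragmentary": 0.9})
# exits at the linguistic stage; the contextual model never runs
```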
Analyzing and Mitigating Negation Artifacts using Data Augmentation for Improving ELECTRA-Small Model Accuracy
Pre-trained models for natural language inference (NLI) often achieve high performance on benchmark datasets by exploiting spurious correlations, or dataset artifacts, rather than understanding linguistic nuances such as negation. In this project, we investigate the performance of an ELECTRA-small model fine-tuned on the Stanford Natural Language Inference (SNLI) dataset, focusing on its handling of negation. Through analysis, we identify that the model struggles to correctly classify examples containing negation. To address this, we augment the training data with contrast sets and adversarial examples emphasizing negation. Our results demonstrate that this targeted data augmentation improves the model's accuracy on negation-containing examples without adversely affecting overall performance, thereby mitigating the identified dataset artifact.
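The augmentation step can be illustrated with a toy contrast-set generator that negates the hypothesis and flips the label accordingly. Real contrast sets are built far more carefully; the single rewrite rule and the label-flip table here are simplifying assumptions:

```python
# Toy negation augmentation for SNLI-style (premise, hypothesis, label)
# triples: add a negated-hypothesis variant with a flipped label.
def negate(hypothesis):
    # Naive rule; hand-written contrast sets replace this in practice.
    return hypothesis.replace(" is ", " is not ", 1)

FLIP = {"entailment": "contradiction",
        "contradiction": "entailment",
        "neutral": "neutral"}

def augment(example):
    premise, hypothesis, label = example
    return [(premise, hypothesis, label),
            (premise, negate(hypothesis), FLIP[label])]

pairs = augment(("A man is playing a guitar.",
                 "A man is making music.", "entailment"))
```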
"Mm, Wat?" Detecting Other-initiated Repair Requests in Dialogue
Ngo, Anh, Rollet, Nicolas, Pelachaud, Catherine, Clavel, Chloe
Maintaining mutual understanding is key to avoiding breakdowns in human-human conversation, and repair, particularly Other-Initiated Repair (OIR, in which one speaker signals trouble and prompts the other to resolve it), plays a vital role in this process. However, Conversational Agents (CAs) still fail to recognize user repair initiation, leading to breakdowns or disengagement. This work proposes a multimodal model to automatically detect repair initiation in Dutch dialogues by integrating linguistic and prosodic features grounded in Conversation Analysis. The results show that prosodic cues complement linguistic features and significantly improve the results of models built on pretrained text and audio embeddings, offering insights into how different features interact. Future directions include incorporating visual cues and exploring multilingual and cross-context corpora to assess robustness and generalizability.
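As a sketch of what the prosodic side of such a model might consume, the snippet below extracts a final pitch-rise cue, loudness, and duration with librosa; the exact features are assumptions, and the paper's Conversation Analysis-grounded set is richer. These vectors would then be fused with pretrained text and audio embeddings before classification:

```python
# Toy prosodic feature extraction for repair-initiation detection.
import librosa
import numpy as np

def prosodic_features(wav_path):
    y, sr = librosa.load(wav_path, sr=16000)
    f0, _, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]                      # keep voiced frames only
    q = max(len(f0) // 4, 1)
    # Rising final intonation is a classic cue for repair initiation.
    rise = float(f0[-q:].mean() - f0[:q].mean()) if len(f0) else 0.0
    return np.array([rise,
                     float(librosa.feature.rms(y=y).mean()),  # loudness
                     len(y) / sr])                            # duration (s)
```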
Irish-BLiMP: A Linguistic Benchmark for Evaluating Human and Language Model Performance in a Low-Resource Setting
McGiff, Josh, Tran, Khanh-Tung, Mulcahy, William, Luinín, Dáibhidh Ó, Dalzell, Jake, Bhroin, Róisín Ní, Burke, Adam, O'Sullivan, Barry, Nguyen, Hoang D., Nikolov, Nikola S.
We present Irish-BLiMP (Irish Benchmark of Linguistic Minimal Pairs), the first dataset and framework designed for fine-grained evaluation of linguistic competence in the Irish language, an endangered language. Drawing on a variety of linguistic literature and grammar reference works, a team of fluent Irish speakers manually constructed and reviewed 1,020 minimal pairs across a taxonomy of 11 linguistic features. We evaluate both existing Large Language Models (LLMs) and fluent human participants on their syntactic knowledge of Irish. Our findings show that humans outperform all models across all linguistic features, achieving 16.6% higher accuracy on average. Moreover, a substantial performance gap of 18.1% persists between open- and closed-source LLMs, with even the strongest model (gpt-5) reaching only 73.5% accuracy compared to 90.1% for humans. Interestingly, human participants and models struggle with different aspects of Irish grammar, highlighting a difference in the representations learned by the models. Overall, Irish-BLiMP provides the first systematic framework for evaluating the grammatical competence of LLMs in Irish and offers a valuable benchmark for advancing research on linguistic understanding in low-resource languages.
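Scoring follows the standard minimal-pair (BLiMP-style) recipe: a model passes a pair when it assigns higher log-probability to the grammatical member. A sketch with GPT-2 on an invented English pair; Irish-BLiMP would score its Irish pairs with the models under test:

```python
# BLiMP-style minimal-pair accuracy with a causal LM.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def logprob(sentence):
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    lp = logits[0, :-1].log_softmax(-1)
    return lp[torch.arange(ids.shape[1] - 1), ids[0, 1:]].sum().item()

pairs = [("The keys are on the table.", "The keys is on the table.")]
accuracy = sum(logprob(good) > logprob(bad) for good, bad in pairs) / len(pairs)
```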
Individualized Cognitive Simulation in Large Language Models: Evaluating Different Cognitive Representation Methods
Zhang, Tianyi, Zhou, Xiaolin, Wang, Yunzhe, Cambria, Erik, Traum, David, Mao, Rui
Individualized cognitive simulation (ICS) aims to build computational models that approximate the thought processes of specific individuals. While large language models (LLMs) convincingly mimic surface-level human behavior such as role-play, their ability to simulate deeper individualized cognitive processes remains poorly understood. To address this gap, we introduce a novel task that evaluates different cognitive representation methods in ICS. We construct a dataset from recently published novels (released after the release dates of the tested LLMs) and propose an 11-condition cognitive evaluation framework to benchmark seven off-the-shelf LLMs in the context of authorial style emulation. We hypothesize that effective cognitive representations can help LLMs generate storytelling that better mirrors the original author. Thus, we test different cognitive representations, e.g., linguistic features, concept mappings, and profile-based information. Results show that combining conceptual and linguistic features is particularly effective in ICS, outperforming static profile-based cues in overall evaluation. Importantly, LLMs are more effective at mimicking linguistic style than narrative structure, underscoring their limits in deeper cognitive simulation. These findings provide a foundation for developing AI systems that adapt to individual ways of thinking and expression, advancing more personalized and human-aligned creative technologies.
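One way to operationalize a feature-based cognitive representation is to summarize an author's measurable linguistic profile and condition generation on it. The profile features and prompt wording below are invented for illustration and are not the paper's conditions:

```python
# Build a simple linguistic-style profile and fold it into a prompt.
import statistics

def linguistic_profile(sentences):
    lengths = [len(s.split()) for s in sentences]
    return (f"average sentence length {statistics.mean(lengths):.1f} words, "
            f"length variability {statistics.stdev(lengths):.1f}")

profile = linguistic_profile(["She left at dawn.",
                              "The road, empty and pale, bent toward the hills."])
prompt = (f"Emulate an author whose style has {profile}. "
          f"Continue the story: ...")
```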